---
title: "Mapping Digital Campaign Strategies: How Political Candidates Use Social Media to Communicate Constituency Connection and Policy Stance"
subtitle: "Overview of datasets that cannot be shared. Due to Twitter’s Terms & Conditions, not all datasets are publicly available."
author: "James Cross, Derek Greene, Stefan Müller, and Martijn Schoonvelde"
format:
  html:
    embed-resources: true
---

```{r, message=FALSE, warning=FALSE}
library(readr)
library(dplyr)
# load predictions for policy content
dat_policy <- read_csv("../replication_computational_communication_research_data_dontshare/unseen-policy-sbert_distilrobertav1_lr.csv") |>
  mutate(user_id = as.factor(user_id))

str(dat_policy)
```
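
Twitter status IDs are 64-bit integers, and values of that size cannot all be represented exactly by R's double type, so an ID read as numeric may lose its trailing digits. The chunk below is a minimal sketch of one way to guard against this, assuming the ID only needs to serve as a key: it declares the column as character through `col_types`. The inline CSV and the ID value are invented for illustration and are not taken from the replication data.

```{r}
library(readr)

# Hypothetical one-row CSV; the ID is made up for illustration.
toy_csv <- I("status_id,text\n1234567890123456789,example tweet\n")

# Column-type guessing reads the ID as a number.
toy_num <- read_csv(toy_csv, show_col_types = FALSE)

# Declaring the column as character keeps every digit intact.
toy_chr <- read_csv(
  toy_csv,
  col_types = cols(status_id = col_character(), text = col_character())
)

# Compare the numeric and character versions of the same ID.
format(toy_num$status_id, scientific = FALSE)
toy_chr$status_id
```

The same `col_types` argument could be passed to the `read_csv()` calls in this document if exact IDs are required; whether that matters depends on how the IDs are used downstream.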

**Variable Overview – `dat_policy` (n = 276,003, 22 variables)**

| Variable                          | Type    | Description                                                                                         |
| --------------------------------- | ------- | --------------------------------------------------------------------------------------------------- |
| `tweet_id`                        | chr     | Unique identifier for each tweet in the dataset, combining the account handle and numeric tweet ID. |
| `user_id`                         | factor  | Identifier for the Twitter account that posted the tweet.                                           |
| `screen_name`                     | chr     | Public-facing username of the Twitter account (without the `@` symbol).                             |
| `status_id`                       | num     | Numeric identifier assigned by Twitter to the tweet (status).                                       |
| `text`                            | chr     | Full text content of the tweet, truncated where indicated in the output.                            |
| `contains_media`                  | logi    | Whether the tweet contains any media (images, video, GIF) — `TRUE` or `FALSE`.                      |
| `n_media_tweet`                   | num     | Number of media items attached to the tweet.                                                        |
| `is_retweet`                      | logi    | Whether the tweet is a retweet (`TRUE`) or an original post (`FALSE`).                              |
| `retweet_count`                   | num     | Number of times the tweet has been retweeted.                                                       |
| `favorite_count`                  | num     | Number of times the tweet has been liked.                                                           |
| `created_at`                      | POSIXct | Date–time when the tweet was created, in UTC.                                                       |
| `party_account`                   | num     | Indicator for whether the tweet was posted from an official party account (`1`) or not (`0`).       |
| `coded_electioneering_image_text` | num     | Manual coding: presence of electioneering content in combined image + text.                         |
| `coded_policy_content_image_text` | num     | Manual coding: presence of policy-related content in combined image + text.                         |
| `coded_electioneering_image`      | num     | Manual coding: presence of electioneering content in image only.                                    |
| `coded_policy_content_image`      | num     | Manual coding: presence of policy-related content in image only.                                    |
| `coded_electioneering_text`       | num     | Manual coding: presence of electioneering content in text only.                                     |
| `coded_policy_content_text`       | num     | Manual coding: presence of policy-related content in text only.                                     |
| `tweet_coded`                     | num     | Indicator for whether the tweet has been manually coded (`1`) or not (`0`).                         |
| `prediction_policy_lr`            | chr     | Model-based classification label — `"policy"` or `"non-policy"` — generated by a transformer model. |
| `prob_policy_lr0`                 | num     | Model-predicted probability that the tweet is `"non-policy"` content, based on transformer output.  |
| `prob_policy_lr1`                 | num     | Model-predicted probability that the tweet is `"policy"` content, based on transformer output.      |
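
The three prediction columns can be checked against each other. The following is a rough sketch, not part of the original analysis: it assumes that the two probabilities sum to one and that the label corresponds to a 0.5 cut-off on `prob_policy_lr1`. Both assumptions should be verified against the actual data. The toy rows and the helper `check_predictions()` are hypothetical.

```{r}
library(dplyr)

# Hypothetical toy rows mimicking three columns of `dat_policy`;
# the values are invented for illustration only.
toy_policy <- tibble(
  prediction_policy_lr = c("policy", "non-policy", "policy"),
  prob_policy_lr0 = c(0.10, 0.80, 0.35),
  prob_policy_lr1 = c(0.90, 0.20, 0.65)
)

# Hypothetical helper: returns one row with two logical checks.
check_predictions <- function(dat, tol = 1e-6) {
  dat |>
    summarise(
      # Do the two class probabilities sum to one (within tolerance)?
      probs_sum_to_one = all(abs(prob_policy_lr0 + prob_policy_lr1 - 1) < tol),
      # Assumption: "policy" is assigned when prob_policy_lr1 exceeds 0.5.
      label_matches_threshold = all(
        (prediction_policy_lr == "policy") == (prob_policy_lr1 > 0.5)
      )
    )
}

policy_check <- check_predictions(toy_policy)
policy_check
```

Running `check_predictions(dat_policy)` would apply the same checks to the full dataset.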




```{r}
# load predictions for electioneering content
dat_electioneering <- read_csv("../replication_computational_communication_research_data_dontshare/unseen-electioneering-sbert_distilrobertav1_lr.csv") |>
  mutate(user_id = as.factor(user_id))

str(dat_electioneering)
```

**Variable Overview – `dat_electioneering` (n = 276,003, 22 variables)**

| Variable                          | Type    | Description                                                                                                        |
| --------------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------ |
| `tweet_id`                        | chr     | Unique string identifier for each tweet, concatenating the account handle and the numeric tweet ID.                |
| `user_id`                         | factor  | Identifier for the Twitter account that posted the tweet.                                                          |
| `screen_name`                     | chr     | Public username of the Twitter account (without the `@` symbol).                                                   |
| `status_id`                       | num     | Numeric identifier assigned by Twitter to the tweet (status).                                                      |
| `text`                            | chr     | Full text content of the tweet; truncated in the preview.                                                          |
| `contains_media`                  | logi    | Logical flag indicating whether the tweet contains attached media (`TRUE`/`FALSE`).                                |
| `n_media_tweet`                   | num     | Count of attached media items (e.g. images, GIFs, videos).                                                         |
| `is_retweet`                      | logi    | Logical flag: `TRUE` if the tweet is a retweet, `FALSE` if it is original content.                                 |
| `retweet_count`                   | num     | Number of times the tweet was retweeted.                                                                           |
| `favorite_count`                  | num     | Number of times the tweet was liked.                                                                               |
| `created_at`                      | POSIXct | UTC timestamp of when the tweet was created.                                                                       |
| `party_account`                   | num     | Binary indicator: `1` if posted from an official party account, `0` otherwise.                                     |
| `coded_electioneering_image_text` | num     | Manual coding: presence of electioneering content in combined image + text.                                        |
| `coded_policy_content_image_text` | num     | Manual coding: presence of policy-related content in combined image + text.                                        |
| `coded_electioneering_image`      | num     | Manual coding: presence of electioneering content in image only.                                                   |
| `coded_policy_content_image`      | num     | Manual coding: presence of policy-related content in image only.                                                   |
| `coded_electioneering_text`       | num     | Manual coding: presence of electioneering content in text only.                                                    |
| `coded_policy_content_text`       | num     | Manual coding: presence of policy-related content in text only.                                                    |
| `tweet_coded`                     | num     | Binary indicator: `1` if the tweet has been manually coded, `0` otherwise.                                         |
| `prediction_electioneering_lr`    | chr     | Model-based classification label — `"electioneering"` or `"non-electioneering"` — produced by a transformer model. |
| `prob_electioneering_lr0`         | num     | Model-predicted probability that the tweet is `"non-electioneering"`.                                              |
| `prob_electioneering_lr1`         | num     | Model-predicted probability that the tweet is `"electioneering"`.                                                  |
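
Because both prediction files carry a `tweet_id` column, the two sets of labels can in principle be combined and cross-tabulated. The chunk below sketches this on invented toy data. It assumes that `tweet_id` is unique within each file and identifies the same tweet in both, which should be confirmed before applying it to `dat_policy` and `dat_electioneering`.

```{r}
library(dplyr)

# Hypothetical toy predictions; IDs and labels are invented for illustration.
toy_policy_pred <- tibble(
  tweet_id = c("a_1", "a_2", "b_3"),
  prediction_policy_lr = c("policy", "non-policy", "policy")
)
toy_elec_pred <- tibble(
  tweet_id = c("a_1", "a_2", "b_3"),
  prediction_electioneering_lr = c(
    "non-electioneering", "electioneering", "electioneering"
  )
)

# Keep only tweets present in both sets of predictions.
combined <- inner_join(toy_policy_pred, toy_elec_pred, by = "tweet_id")

# Count tweets in each combination of predicted labels.
cross_tab <- count(combined, prediction_policy_lr, prediction_electioneering_lr)
cross_tab
```

An `inner_join()` silently drops tweets that appear in only one file, so comparing `nrow(combined)` with the row counts of the inputs is a quick way to spot mismatches.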
